Skip to content

Zero-initialize parent_cpstate in analyze_cypher#2423

Merged
jrgemignani merged 1 commit into
apache:masterfrom
hari90:claude/fix-cypher-parser-memory-J8LsQ
May 4, 2026
Merged

Zero-initialize parent_cpstate in analyze_cypher#2423
jrgemignani merged 1 commit into
apache:masterfrom
hari90:claude/fix-cypher-parser-memory-J8LsQ

Conversation

@hari90
Copy link
Copy Markdown
Contributor

@hari90 hari90 commented May 2, 2026

cypher_parsestate parent_cpstate is declared on the stack in analyze_cypher() and only pstate, graph_name, and params are explicitly set before it is passed to make_cypher_parsestate(). The latter reads parent_cpstate->subquery_where_flag (and other fields) in cypher_parse_node.c, which leaves them with indeterminate values. Yugabyte UBSan flagged the garbage bool (value 8) and aborted the backend.

Use MemSet to zero the struct before populating the explicit fields so all remaining members start with a defined value.

cypher_parsestate parent_cpstate is declared on the stack in
analyze_cypher() and only pstate is explicitly set before it is passed
to make_cypher_parsestate(). The latter reads
parent_cpstate->subquery_where_flag (and other fields) in
cypher_parse_node.c, which leaves them with indeterminate values. UBSan
flagged the garbage bool (value 8) and aborted the backend.

Use MemSet to zero the struct before populating pstate so all remaining
members start with a defined value.
@hari90 hari90 force-pushed the claude/fix-cypher-parser-memory-J8LsQ branch from c6a9b79 to e30e73c Compare May 3, 2026 05:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an uninitialized-stack-memory bug in the Cypher analysis path by ensuring the temporary cypher_parsestate wrapper passed into make_cypher_parsestate() starts from a fully defined state, preventing UBSan failures and undefined behavior.

Changes:

  • Zero-initialize parent_cpstate with MemSet() in analyze_cypher() before setting its explicitly required fields.
  • Remove now-redundant explicit NULL assignments for members covered by the zero-initialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jrgemignani jrgemignani merged commit e9ef30b into apache:master May 4, 2026
10 checks passed
@hari90 hari90 deleted the claude/fix-cypher-parser-memory-J8LsQ branch May 4, 2026 18:49
@hari90
Copy link
Copy Markdown
Contributor Author

hari90 commented May 4, 2026

Thank you

hari90 added a commit to yugabyte/yugabyte-db that referenced this pull request May 4, 2026
## Summary

Fixes #31404. Zero-initialize the stack-allocated `parent_cpstate` in
`analyze_cypher()` before passing it to `make_cypher_parsestate()`.

The struct previously had only `pstate`, `graph_name`, and `params` set,
leaving `subquery_where_flag` (and other fields) as uninitialized stack
memory. `make_cypher_parsestate()` reads
`parent_cpstate->subquery_where_flag`
directly (`cypher_parse_node.c:62`), so under UndefinedBehaviorSanitizer
the
load aborts the backend whenever the stack byte is something other than
0 or
1. The first `cypher()` call in the asan/ubsan build was reliably
crashing
with:

```
runtime error: load of value 8, which is not a valid value for type 'bool'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior cypher_parse_node.c:62:56
```

This is a latent upstream Apache AGE bug; the file is a verbatim subtree
import. Worth reporting upstream as well.

The redundant `graph_name = NULL`/`params = NULL` assignments are
removed
since `memset` already zeros them.

## Test plan

- `./yb_build.sh asan --java-test
'org.yb.pgsql.TestPgRegressThirdPartyExtensionsMage#schedule'`
  was failing on `yb.orig.basic` before this change and passes with it.

Upstream fix apache/age#2423

---

[CSI](<https://csiweb.dev.yugabyte.com/pull/31405/>)
hari90 added a commit to yugabyte/yugabyte-db that referenced this pull request May 5, 2026
…#31405) (#31423)

## Summary

Fixes #31404. Zero-initialize the stack-allocated `parent_cpstate` in
`analyze_cypher()` before passing it to `make_cypher_parsestate()`.

The struct previously had only `pstate`, `graph_name`, and `params` set,
leaving `subquery_where_flag` (and other fields) as uninitialized stack
memory. `make_cypher_parsestate()` reads
`parent_cpstate->subquery_where_flag`
directly (`cypher_parse_node.c:62`), so under UndefinedBehaviorSanitizer
the
load aborts the backend whenever the stack byte is something other than
0 or
1. The first `cypher()` call in the asan/ubsan build was reliably
crashing
with:

```
runtime error: load of value 8, which is not a valid value for type 'bool'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior cypher_parse_node.c:62:56
```

This is a latent upstream Apache AGE bug; the file is a verbatim subtree
import. Worth reporting upstream as well.

The redundant `graph_name = NULL`/`params = NULL` assignments are
removed
since `memset` already zeros them.

Original commit: 154e1a6 / #31405

## Test plan

- `./yb_build.sh asan --java-test
'org.yb.pgsql.TestPgRegressThirdPartyExtensionsMage#schedule'`
  was failing on `yb.orig.basic` before this change and passes with it.

Upstream fix apache/age#2423

---

[CSI](<https://csiweb.dev.yugabyte.com/pull/31423/>)
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR apache#2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR apache#2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude <noreply@anthropic.com>

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR apache#2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR apache#2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude <noreply@anthropic.com>

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR apache#2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR apache#2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude noreply@anthropic.com

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR #2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR #2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude noreply@anthropic.com

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants